Sparse clustering, which aims to find a proper partition of an extremely high-dimensional data set containing redundant noise features, has attracted increasing interest in recent years. Existing studies commonly solve the problem in the framework of maximizing weighted feature contributions subject to an $\ell_2/\ell_1$ penalty. Nevertheless, this framework has two serious drawbacks: the solution unavoidably involves a considerable portion of the redundant noise features in many situations, and the framework neither offers an intuitive explanation of why it can select relevant features nor provides any theoretical guarantee of feature selection consistency. In this article, we attempt to overcome these drawbacks by developing a new sparse clustering framework that uses an $\ell_{\infty}/\ell_0$ penalty. First, we introduce new concepts of optimal partitions and noise features for high-dimensional data clustering problems, based on which the previously known framework can be intuitively explained in principle. Then, we apply the suggested $\ell_{\infty}/\ell_0$ framework to formulate a new sparse k-means model with the $\ell_{\infty}/\ell_0$ penalty ($\ell_0$-k-means for short), and we propose an efficient iterative algorithm for solving it. To better understand the behavior of $\ell_0$-k-means, we prove that the solution yielded by the $\ell_0$-k-means algorithm is feature-selection consistent whenever the data matrix is generated from a high-dimensional Gaussian mixture model. Finally, we provide experiments on both synthetic data and the Allen Developing Mouse Brain Atlas data to show that the proposed $\ell_0$-k-means exhibits better noise-feature detection capacity than the previously known sparse k-means with the $\ell_2/\ell_1$ penalty ($\ell_1$-k-means for short).
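The abstract does not spell out the iterative algorithm, but the general flavor of an $\ell_0$-constrained sparse k-means can be sketched as an alternating scheme: run k-means on the currently selected features, then reselect the $s$ features with the largest between-cluster sum of squares (a hard, $\ell_0$-style selection). The sketch below is a minimal illustration under these assumptions, not the paper's actual $\ell_0$-k-means; the function name, the farthest-first seeding, and the budget parameter `s` are all hypothetical choices for the example.

```python
import numpy as np

def l0_sparse_kmeans(X, k, s, n_iter=10):
    """Hypothetical sketch of an l0-constrained sparse k-means:
    alternate (i) k-means restricted to the s selected features with
    (ii) reselecting the s features whose between-cluster sum of
    squares (BCSS) under the current partition is largest."""
    n, p = X.shape
    active = np.arange(min(s, p))           # start from the first s features
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        Xa = X[:, active]
        # deterministic farthest-first seeding, then Lloyd iterations
        centers = [Xa[0]]
        for _ in range(k - 1):
            d = np.min([((Xa - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(Xa[np.argmax(d)])
        centers = np.array(centers)
        for _ in range(10):
            d = ((Xa[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = Xa[labels == j].mean(axis=0)
        # per-feature BCSS under the current partition
        overall = X.mean(axis=0)
        bcss = np.zeros(p)
        for j in range(k):
            mask = labels == j
            if mask.any():
                bcss += mask.sum() * (X[mask].mean(axis=0) - overall) ** 2
        active = np.sort(np.argsort(bcss)[-s:])  # keep the s most informative
    return labels, active
```

In contrast to the $\ell_2/\ell_1$ approach, which assigns soft nonnegative weights to all features, this hard selection sets all but $s$ feature weights exactly to zero, which is why noise features can be excluded outright rather than merely down-weighted.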